On Evaluating and Comparing Conversational Agents
Authors
Abstract
Conversational agents are exploding in popularity. However, much work remains in the area of non-goal-oriented conversations, despite significant growth in research interest over recent years. To advance the state of the art in conversational AI, Amazon launched the Alexa Prize, a 2.5-million-dollar university competition in which sixteen selected university teams built conversational agents to deliver the best social conversational experience. The Alexa Prize gave the academic community a unique opportunity to perform research with a live system used by millions of users. The subjectivity associated with evaluating conversations is a key element underlying the challenge of building non-goal-oriented dialogue systems. In this paper, we propose a comprehensive evaluation strategy with multiple metrics, designed to reduce subjectivity by selecting metrics that correlate well with human judgment. The proposed metrics provide a granular analysis of the conversational agents that is not captured in human ratings. We show that these metrics can be used as a reasonable proxy for human judgment. We provide a mechanism to unify the metrics for selecting the top-performing agents, which has also been applied throughout the Alexa Prize competition. To our knowledge, this is the largest setting to date for evaluating agents, with millions of conversations and hundreds of thousands of ratings from users. We believe that this work is a step towards an automatic evaluation process for conversational AIs.
Similar resources
A Black-box Approach for Response Quality Evaluation of Conversational Agent Systems
The evaluation of conversational agents or chatterbots question answering systems is a major research area that needs much attention. Before the rise of domain-oriented conversational agents based on natural language understanding and reasoning, evaluation was never a problem, as information retrieval-based metrics were readily available for use. However, when chatterbots began to become more doma...
Evaluating Embodied Conversational Agents in Collaborative Virtual Environments
There are currently no evaluation methods specific to ECAs in CVEs, and traditional evaluation methods are limited in their applicability and consequently unlikely to address the full range of aspects now inherent in such systems. We argue that a combination of controlled experimentation, quasi-experiments, review-based evaluation and heuristic expert reviews is needed. To operationalise these t...
Scripting and Evaluating Affective Interactions with Embodied Conversational Agents
This paper describes the results obtained and ongoing agenda of a research project on embodied conversational agents, carried out at the University of Tokyo. The main focus points of the project are the development of scripting languages for controlling life-like agents and the modeling of affective interactions between agents and human users. Furthermore, the project aims at evaluating the imp...
Social Dialogue with Embodied Conversational Agents
The functions of social dialogue between people in the context of performing a task are discussed, as well as approaches to modelling such dialogue in embodied conversational agents. A study of an agent’s use of social dialogue is presented, comparing embodied interactions with similar interactions conducted over the phone, assessing the impact these media have on a wide range of behavioural, ta...
Reflections on Jennifer Saul's View of Successful Communication and Conversational Implicature
Saul (2002) criticizes a view on the relationship between speaker meaning and conversational implicatures according to which speaker meaning is exhaustively comprised of what is said and what is implicated. In the course of making her points, she develops a couple of new notions which she calls “utterer-implicature” and “audience-implicature”. She then makes certain claims about the relationshi...
Journal title:
- CoRR
Volume: abs/1801.03625
Pages: -
Publication date: 2018